Abstract: In this paper the relation between the nonanticipative rate distortion function (RDF) and Bayesian filtering theory is investigated using the topology of weak convergence of probability measures on Polish spaces. The relation is established via an optimization, on the space of conditional distributions, of the so-called directed information subject to fidelity constraints. Existence of the optimal reconstruction distribution of the nonanticipative RDF is shown, and the optimal nonanticipative reproduction conditional distribution for stationary processes is derived in closed form. The realization procedure of the nonanticipative RDF is described, and an example is introduced to illustrate the concepts.

I. INTRODUCTION 

This paper is concerned with the abstract formulation of the nonanticipative rate distortion function (RDF) on Polish spaces (complete separable metric spaces) and its relation to filtering theory. In the past, rate distortion (or distortion rate) functions and filtering theory have evolved independently. Classical RDF addresses the problem of reconstruction of a process subject to a fidelity criterion, without much emphasis on the realization of the reconstruction conditional distribution via nonanticipative operations. Filtering theory, on the other hand, is developed by imposing real-time realizability on estimators with respect to measurement data. Specifically, least-squares filtering theory deals with the characterization of the conditional distribution of the unobserved process given the measurement data, via a stochastic differential equation which depends on the observation data through nonanticipative operations [1].

Although both reliable communication and filtering (state estimation for control) are concerned with the reconstruction of processes, the main underlying assumptions characterizing them are different.

Historically, the work of R. Bucy [2] appears to be the first to consider the direct relation between the distortion rate function and filtering, by carrying out the computation of a realizable distortion rate function with square criteria for two samples of the Ornstein-Uhlenbeck process. The work of A. K. Gorbunov and M. S. Pinsker [3] on $\varepsilon$-entropy, defined via a nonanticipative constraint on the reproduction distribution of the RDF, although not directly related to the realizability question pursued by Bucy, computes the nonanticipative RDF for stationary Gaussian processes via power spectral densities.

The objective of this paper is to investigate the connection between nonanticipative RDF and filtering theory for general distortion functions and random processes on abstract Polish spaces using the topology of weak convergence. The connection is established via optimization of directed information [4] over the space of conditional distributions which satisfy an average distortion constraint.

The main results discussed in this paper are the following. 

(1) Existence of the optimal reconstruction distribution minimizing directed information, using the topology of weak convergence of probability measures on Polish spaces;

(2) Closed form expression of the optimal reconstruction conditional distribution for stationary processes;

(3) Realization procedure of the filter;

(4) Example to demonstrate the realization of the filter.

Motivation. This work is motivated by applications in 
which estimators are desired to have specific accuracy, 
such as processing information from sensor networks [5], 
and by control over limited rate communication channel 
applications [6], [7]. It is important to note that over the 
years several papers have appeared in the literature utilizing 
information theoretic measures for estimator and control 
applications [8], [9]. First, we give a brief high level 
discussion on nonanticipative RDF and filtering theory, and 
discuss their connection. 

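Consider a discrete-time process $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{X}_i$, and its reconstruction $Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{Y}_i$, where $\mathcal{X}_i$ and $\mathcal{Y}_i$ are Polish spaces.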

Bayesian Estimation Theory. In classical filtering, one is given a mathematical model that generates the process $X^n$, $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0,1,\ldots,n\}$, often induced via discrete-time recursive dynamics, and a mathematical model that generates observed data obtained from sensors, say, $Z^n$, $\{P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) : i = 0,1,\ldots,n\}$, while $Y^n$ are the causal estimates of some function of the process $X^n$ based on the observed data $Z^n$. The classical Kalman filter is a well-known example, where $\widehat{X}_i = \mathbb{E}[X_i|Z^{i-1}]$, $i = 0,1,\ldots,n$, is the conditional mean which minimizes the average least-squares estimation error. Thus, in classical filtering theory both models which generate the unobserved and observed processes, $X^n$ and $Z^n$, respectively, are given a priori. Fig. 1 is the block diagram of the filtering problem.

Fig. 1. Filtering problem.

Nonanticipative Rate Distortion Theory and Estimation. In nonanticipative rate distortion theory one is given a distribution for the process $X^n$, which induces $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0,1,\ldots,n\}$, and determines the nonanticipative reconstruction conditional distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$ which minimizes the directed information from $X^n$ to $Y^n$ subject to a distortion or fidelity constraint. The filter $\{Y_i : i = 0,1,\ldots,n\}$ of $\{X_i : i = 0,1,\ldots,n\}$ is found by realizing the optimal reconstruction distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$ via a cascade of sub-systems as shown in Fig. 2. Thus, in nonanticipative rate distortion theory the observation or mapping from $\{X_i : i = 0,1,\ldots,n\}$ to $\{Z_i : i = 0,1,\ldots,n\}$ is part of the realization procedure, while in filtering theory this mapping is given a priori. Indeed, this is the main difference between Bayesian estimation theory and nonanticipative RDF for the purpose of estimation.

Fig. 2. Filtering via nonanticipative rate distortion function.

The precise problem formulation necessitates the definitions of distortion function or fidelity, and directed information. The distortion function or fidelity between $x^n$ and its reconstruction $y^n$ is a measurable function defined by

$$d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \to [0,\infty], \qquad d_{0,n}(x^n,y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i,y^i).$$

The directed information between $X^n$ and $Y^n$, for a given distribution $P_{X^n}(dx^n)$ and conditional distribution $P_{Y^n|X^n}(dy^n|x^n)$, is defined by [10]$^4$

$$I(X^n \to Y^n) \triangleq \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) = \sum_{i=0}^{n} \int \log\left( \frac{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)}{P_{Y_i|Y^{i-1}}(dy_i|y^{i-1})} \right) P_{X^i,Y^i}(dx^i,dy^i) \tag{1}$$

$$\equiv \mathbb{I}_{X^n \to Y^n}\big(P_{X_i|X^{i-1},Y^{i-1}}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n\big). \tag{2}$$

The notation $\mathbb{I}_{X^n \to Y^n}(\cdot,\cdot)$ illustrates the dependence of $I(X^n \to Y^n)$ on the two sequences of nonanticipative conditional distributions $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot), P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0,1,\ldots,n\}$.

4 Unless otherwise stated, integrals with respect to probability distributions are over the spaces on which these are defined.

In information theory, directed information $\mathbb{I}_{X^n \to Y^n}(\cdot,\cdot)$ is often used as an information theoretic measure which describes the directivity of information flow via a sequence of channel outputs, defined by a nonanticipative sequence of feedback conditional distributions $P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot)$ and feedforward conditional distributions $P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot)$, $i = 0,1,\ldots,n$. Directed information is also used in biological applications [11], [12] as a measure of causality, describing the cause and effect.
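For finite alphabets the sum in (1) can be evaluated directly. The following is a minimal Python sketch, assuming only two time steps ($i = 0, 1$) and a joint probability mass function stored as a dictionary keyed by `(x0, x1, y0, y1)`; the function name `directed_information` and this data layout are illustrative assumptions, not part of the paper.

```python
import itertools
import math


def directed_information(joint):
    """Directed information I(X^1 -> Y^1) in bits for a two-step joint pmf.

    `joint` maps tuples (x0, x1, y0, y1) to probabilities (assumed layout).
    The value is I(X_0; Y_0) + I(X_0, X_1; Y_1 | Y_0), i.e. the sum in (1).
    """
    def marginal(keep):
        out = {}
        for outcome, p in joint.items():
            key = tuple(outcome[k] for k in keep)
            out[key] = out.get(key, 0.0) + p
        return out

    p_x0y0 = marginal((0, 2))
    p_x0 = marginal((0,))
    p_y0 = marginal((2,))
    p_x0x1y0 = marginal((0, 1, 2))
    p_y0y1 = marginal((2, 3))

    total = 0.0
    # i = 0 term: I(X_0; Y_0)
    for (x0, y0), p in p_x0y0.items():
        if p > 0:
            total += p * math.log2(p / (p_x0[(x0,)] * p_y0[(y0,)]))
    # i = 1 term: I(X_0, X_1; Y_1 | Y_0)
    for (x0, x1, y0, y1), p in joint.items():
        if p > 0:
            num = p / p_x0x1y0[(x0, x1, y0)]        # P(y1 | y0, x0, x1)
            den = p_y0y1[(y0, y1)] / p_y0[(y0,)]    # P(y1 | y0)
            total += p * math.log2(num / den)
    return total


bits = (0, 1)
# Noiseless reconstruction Y_i = X_i of two fair, independent bits.
noiseless = {(x0, x1, x0, x1): 0.25 for x0, x1 in itertools.product(bits, repeat=2)}
# Reconstruction independent of the source.
independent = {o: 1.0 / 16 for o in itertools.product(bits, repeat=4)}
di_noiseless = directed_information(noiseless)
di_independent = directed_information(independent)
```

In the noiseless case each step contributes one bit, and in the independent case the directed information vanishes, which matches the interpretation of (1) as a measure of causal information flow.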
In this paper, it is assumed that, $\forall i = 0,1,\ldots,n$,

$$P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|x^{i-1})\text{-a.s.} \tag{3}$$

The above assumption states that the process $\{X_i : i = 0,1,\ldots,n\}$ is conditionally independent of $Y^{i-1} = y^{i-1}$ given knowledge of $X^{i-1} = x^{i-1}$. Clearly, (3) is implied by the following conditional independence, $P_{Y_i|Y^{i-1},X^{\infty}}(dy_i|y^{i-1},x^{\infty}) = P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$-a.s., $\forall i = 0,1,\ldots,n$. The last assumption implies that the reconstruction $Y_i$ does not depend on future values $X_{i+1}^{\infty} \triangleq \{X_{i+1}, X_{i+2}, \ldots, X_{\infty}\}$, that is, $Y_i$ is nonanticipative with respect to the process $\{X_i : i = 0,1,\ldots,n\}$.
Given a probability distribution $P_{X^n}(dx^n)$ and a sequence of conditional distributions $\{P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n\}$, the directed information in the definition of nonanticipative RDF is given by

$$I(X^n \to Y^n) \equiv \mathbb{I}_{X^n \to Y^n}\big(P_{X^n}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n\big). \tag{4}$$


Nonanticipative Rate Distortion Function. The nonanticipative RDF is defined by

$$R^{c}_{0,n}(D) \triangleq \inf_{\substack{P_{Y_i|Y^{i-1},X^i},\ i = 0,1,\ldots,n:\\ E\{d_{0,n}(X^n,Y^n)\} \le D}} \mathbb{I}_{X^n \to Y^n}\big(P_{X^n}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n\big). \tag{5}$$

The definition of the nonanticipative RDF is consistent with [3], in which nonanticipation is defined via the Markov chain $X_{n+1}^{\infty} \leftrightarrow X^n \leftrightarrow Y^n$, i.e., $P_{Y^n|X^{\infty}}(dy^n|x^{\infty}) = P_{Y^n|X^n}(dy^n|x^n)$. Therefore, by finding the solution of (5), one can realize it via a channel from which one can construct an optimal filter via nonanticipative operations as in Fig. 2.

This paper is organized as follows. Section II discusses the formulation on abstract spaces. Section III establishes existence of the optimal minimizing distribution, and Section IV derives the optimal minimizing distribution for stationary processes. Section V describes the realization of the nonanticipative RDF, while Section VI provides an example. Lengthy proofs are omitted due to space limitations.

II. Abstract Formulation 

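The source and reconstruction alphabets are sequences of Polish spaces [13] as defined in the previous section. Probability distributions on any measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. For $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ measurable spaces, the set of conditional distributions $P_{Y|X}(\cdot|X = x)$ is denoted by $\mathcal{Q}(\mathcal{Y};\mathcal{X})$, and these are equivalent to stochastic kernels on $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ given $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.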

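Given the process distributions $P_{X^n}(dx^n)$ and $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$, the following probability distributions are defined.

(P1): The reconstruction conditional probability distribution

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.} \tag{6}$$

(P2): The joint probability distribution $P_{X^n,Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$ for $G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$:

$$P_{X^n,Y^n}(G_{0,n}) \triangleq (P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}(G_{0,n,x^n}; x^n)\, P_{X^n}(dx^n)$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$, defined by $G_{0,n,x^n} \triangleq \{y^n \in \mathcal{Y}_{0,n} : (x^n,y^n) \in G_{0,n}\}$, and $\otimes$ denotes the convolution.

(P3): The marginal distribution $P_{Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$P_{Y^n}(F_{0,n}) \triangleq P(\mathcal{X}_{0,n} \times F_{0,n}) = \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}\big((\mathcal{X}_{0,n} \times F_{0,n})_{x^n}; x^n\big)\, P_{X^n}(dx^n), \qquad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n}).$$

The set of all $(n+1)$-fold such convolution distributions is defined by

$$\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n}) \triangleq \Big\{ \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n}) : \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.} \Big\}.$$

Directed information is defined via the Kullback-Leibler distance:

$$I(X^n \to Y^n) \triangleq \mathbb{D}(P_{X^n,Y^n} \,\|\, P_{X^n} \times P_{Y^n}) = \mathbb{D}(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n} \,\|\, P_{X^n} \times P_{Y^n})$$

$$= \int \log\left( \frac{d(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n})}{d(P_{X^n} \times P_{Y^n})} \right) d(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n}) \equiv \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}). \tag{7}$$

Note that (7) states that directed information is expressed as a functional of $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.

Next, the definition of nonanticipative RDF is given.

Definition 1: (Nonanticipative Rate Distortion Function) Suppose $d_{0,n} \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i,y^i)$, where $\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \to [0,\infty)$ is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion functions, and let $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ (assuming it is non-empty) denote the average distortion or fidelity constraint defined by

$$\overrightarrow{\mathcal{Q}}_{0,n}(D) \triangleq \Big\{ \overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n}) : \ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) \triangleq \int d_{0,n}(x^n,y^n)\, \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \le D \Big\} \tag{8}$$

where $D \ge 0$. The nonanticipative RDF is defined by

$$R^{c}_{0,n}(D) \triangleq \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}). \tag{9}$$

Clearly, $R^{c}_{0,n}(D)$ is characterized by minimizing directed information, or equivalently $\mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$, over $\overrightarrow{\mathcal{Q}}_{0,n}(D)$.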

III. Existence of Reconstruction Conditional 
Distribution 

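In this section, the existence of the minimizing $(n+1)$-fold convolution of conditional distributions in (9) is established by using the topology of weak convergence of probability measures on Polish spaces. Before we present the relevant results we state some properties of the average distortion set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ and the directed information $\mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$. These properties are derived in [14].

Theorem 1: [14] Let $\{\mathcal{X}_n : n \in \mathbb{N}\}$ and $\{\mathcal{Y}_n : n \in \mathbb{N}\}$ be Polish spaces. Then

(1) The set $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ is convex.

(2) $\mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is a convex functional of $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ for a fixed $P_{X^n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$.

(3) The set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is convex.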

Let $BC(\mathcal{Y}_{0,n})$ denote the set of bounded continuous real-valued functions on $\mathcal{Y}_{0,n}$. Below, we introduce the main conditions for establishing existence of the nonanticipative RDF.

Assumption 1: The following conditions are assumed throughout the paper.

(1) $\mathcal{Y}_{0,n}$ is a compact Polish space, $\mathcal{X}_{0,n}$ is a Polish space;

(2) for all $h(\cdot) \in BC(\mathcal{Y}_{0,n})$, the function mapping $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto \int_{\mathcal{Y}_n} h(y)\, P_{Y|Y^{n-1},X^n}(dy|y^{n-1},x^n) \in \mathbb{R}$ is continuous jointly in the variables $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$;

(3) $d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$;

(4) the distortion level $D$ is such that there exists a sequence $(x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}$ satisfying $d_{0,n}(x^n, y^n) < D$.

Note that since $\mathcal{Y}_{0,n}$ is assumed to be a compact Polish space, by [13] probability measures on $\mathcal{Y}_{0,n}$ are weakly compact. Moreover, the following weak compactness result can be obtained, which will be used to show existence of an optimal nonanticipative RDF, $R^{c}_{0,n}(D)$.



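Lemma 1: Suppose Assumption 1 (1), (2) hold. Then

(1) The set $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ is weakly compact.

(2) Under the additional conditions (3), (4) the set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed subset of $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ (hence compact).

Proof: (1) This follows from the fact that any $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ is factorized as

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.},$$

where the factors are $P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$, $0 \le i \le n$, and $\mathcal{Y}_{0,n}$ being a compact Polish space implies that $\{P_{Y_i|Y^{i-1},X^i}(\cdot|y^{i-1},x^i) : y^{i-1} \in \mathcal{Y}_{0,i-1}, x^i \in \mathcal{X}_{0,i}\}$ is compact, hence by Prohorov's theorem it is uniformly tight [13]. Utilizing this, by induction it can be shown that the family of convolution measures $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ is compact.

(2) Utilizing compactness of $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ and condition (3) of Assumption 1 on $d_{0,n}(x^n, \cdot)$, it can be shown that $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed subset of $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$.

The previous results follow from Prohorov's theorem, which relates tightness and weak compactness.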

The next theorem establishes existence of the minimizing reconstruction kernel for (9); it follows from Lemma 1 and the lower semicontinuity of $\mathbb{I}_{X^n \to Y^n}(P_{X^n}, \cdot)$ with respect to $\overrightarrow{P}_{Y^n|X^n}$.

Theorem 2: Suppose the conditions of Lemma 1 hold. Then $R^{c}_{0,n}(D)$ has a minimum.

Proof: The proof is omitted due to space limitations.



IV. Optimal Reconstruction of Nonanticipative 
Rate Distortion Function 

In this section the form of the optimal reconstruction conditional distribution is derived under a stationarity assumption. The method is based on calculus of variations on the space of measures. We introduce the following main assumption.

Assumption 2: The $(n+1)$-fold convolution conditional distribution $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$-a.s. is the convolution of stationary conditional distributions.

Assumption 2 holds for a stationary process $\{(X_i, Y_i) : i \in \mathbb{N}\}$ and $\rho_{0,i} \equiv \rho(T^i x^n, T^i y^n)$, where $T^i x^n$ is the shift operator on $x^n$ (and similarly for $T^i y^n$). The consequence of Assumption 2, which holds for stationary processes and a single letter distortion function, is that the Gateaux differential of $\mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is done in only one direction (since $P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$ are stationary). Therefore, we define the variation of $\overrightarrow{P}^{0}_{Y^n|X^n}$ in the direction of $\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}$ via $\overrightarrow{P}^{\epsilon}_{Y^n|X^n} \triangleq \overrightarrow{P}^{0}_{Y^n|X^n} + \epsilon\,(\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})$, $\epsilon \in [0,1]$, since under Assumption 2 the functionals $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i}) : i = 0,1,\ldots,n\}$ are identical.

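Theorem 3: Suppose Assumption 2 holds and $\mathbb{I}_{P_{X^n}}(\overrightarrow{P}_{Y^n|X^n}) \triangleq \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is well defined for every $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)$, possibly taking values from the set $[0,\infty)$. Then $\overrightarrow{P}_{Y^n|X^n} \to \mathbb{I}_{P_{X^n}}(\overrightarrow{P}_{Y^n|X^n})$ is Gateaux differentiable at every point in $\overrightarrow{\mathcal{Q}}_{0,n}(D)$, and the Gateaux derivative at the point $\overrightarrow{P}^{0}_{Y^n|X^n}$ in the direction $\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}$ is given by

$$\delta\mathbb{I}_{P_{X^n}}\big(\overrightarrow{P}^{0}_{Y^n|X^n}; \overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}\big) = \int \log\left( \frac{\overrightarrow{P}^{0}_{Y^n|X^n}(dy^n|x^n)}{P^{0}_{Y^n}(dy^n)} \right) \big(\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}\big)(dy^n|x^n) \otimes P_{X^n}(dx^n)$$

where $P^{0}_{Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal measure corresponding to $\overrightarrow{P}^{0}_{Y^n|X^n} \otimes P_{X^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$.

Proof: The proof is similar to the one in [15] (although it is more involved). ■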

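The constrained problem defined by (9) can be reformulated as an unconstrained problem using Lagrange multipliers. The equivalence of constrained and unconstrained problems is established next.

Lemma 2: Suppose Assumptions 1, 2 hold and consider $d_{0,n}(x^n,y^n) \triangleq \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$, where $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \to \mathbb{R}_0 \triangleq [0,\infty]$ is continuous in the second argument. Then the constrained problem as stated in Theorem 2 is equivalent to an unconstrained problem stated below.

$$\inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) = \max_{s \le 0}\ \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})} \Big\{ \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\Big( \int d_{0,n}(x^n,y^n)\, \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) - D \Big) \Big\}.$$

Further, the infimum occurs on the boundary of the set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$.

Proof: The proof utilizes the Lagrange duality theorem [16]. ■

Utilizing Lemma 2, then

$$R^{c}_{0,n}(D) = \max_{s \le 0}\ \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})} \Big\{ \mathbb{I}_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\big( \ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) - D \big) \Big\}. \tag{10}$$

Note that $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n};\mathcal{X}_{0,n})$ are probability measures on $\mathcal{Y}_{0,n}$; therefore, one should introduce another set of Lagrange multipliers to obtain an unconstrained problem free of such a constraint.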

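Since $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$ is a consistent probability measure on $\mathcal{Y}_{0,n}$, then for each $k = 0,1,\ldots,n$, $\int_{\mathcal{Y}_{0,k}} \overrightarrow{P}_{Y^k|X^k}(dy^k|x^k) = 1$. This constraint is expressed via

$$\sum_{i=0}^{n} \int \lambda_i(x^i,y^{i-1}) \Big( \int \overrightarrow{P}_{Y^i|X^i}(dy^i|x^i) - 1 \Big) P_{X^i}(dx^i) = \sum_{i=0}^{n} \int \lambda_i(x^i,y^{i-1}) \Big( \int \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) - 1 \Big) P_{X^n}(dx^n) \tag{11}$$

where $\{\lambda_i(\cdot,\cdot) : i = 0,1,\ldots,n\}$ are Lagrange multipliers. The above observations yield the following theorem.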

Theorem 4: Suppose the assumptions of Lemma 2 hold and consider $d_{0,n}(x^n,y^n) = \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$. Then

(1) The infimum in (10) is attained at $\overrightarrow{P}^{*}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)$ given by

$$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = \otimes_{i=0}^{n} \frac{e^{s\rho(T^i x^n, T^i y^n)}\, P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)}\, P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}\text{-a.s.} \tag{12}$$

where $s \le 0$ and $P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$.

(2) The nonanticipative RDF is given by

$$R^{c}_{0,n}(D) = sD - \sum_{i=0}^{n} \int \log\Big( \int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)}\, P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}) \Big)\, \overrightarrow{P}^{*}_{Y^{i-1}|X^{i-1}}(dy^{i-1}|x^{i-1}) \otimes P_{X^i}(dx^i). \tag{13}$$

If $R^{c}_{0,n}(D) > 0$ then $s < 0$ and

$$\sum_{i=0}^{n} \int \rho(T^i x^n, T^i y^n)\, \overrightarrow{P}^{*}_{Y^i|X^i}(dy^i|x^i) \otimes P_{X^i}(dx^i) = D. \tag{14}$$

5 Due to the stationarity assumption, $P_{Y_i|Y^{i-1}}(\cdot|\cdot) = P(\cdot|\cdot)$ and $P^{*}_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) = P^{*}(\cdot|\cdot,\cdot)$.

Proof: The fully unconstrained problem of (10) is obtained by introducing another set of Lagrange multipliers $\{\lambda_i(\cdot,\cdot) : i = 0,1,\ldots,n\}$ as in (11). The derivation is omitted due to space limitations. ■

Remark 1: Note that if the distortion function satisfies $\rho(T^i x^n, T^i y^n) = \rho(x_i, T^i y^n)$ then for $i = 0,1,\ldots,n$

$$P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = P^{*}_{Y_i|Y^{i-1},X_i}(dy_i|y^{i-1},x_i)\text{-a.s.} \tag{15}$$

that is, the reconstruction kernel is Markov in $X^n$. However, without further restriction one cannot claim that this conditional distribution is also Markov with respect to $\{Y_i : i = 0,1,\ldots,n\}$.

V. Realization of Nonanticipative Rate
Distortion Function

The realization of the nonanticipative RDF (optimal reconstruction conditional distribution) is equivalent to the sensor mapping as shown in Fig. 2, which produces the auxiliary random process $\{Z_i : i \in \mathbb{N}\}$ that will be used for filtering. This is equivalent to identifying a communication channel, an encoder and a decoder such that the reconstruction from the sequence $X^n$ to the sequence $Y^n$ matches the nonanticipative rate distortion minimizing reconstruction kernel. Fig. 3 illustrates the cascade sub-systems that realize the nonanticipative RDF, which is consistent with the discussion in the introduction.

Definition 2: Given a source $\{P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) : i = 0,\ldots,n\}$, a channel $\{P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) : i = 0,\ldots,n\}$ is a realization of the optimal reconstruction distribution if there exists a pre-channel encoder $\{P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1},b^{i-1},x^i) : i = 0,\ldots,n\}$ and a post-channel decoder $\{P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1},b^i) : i = 0,\ldots,n\}$ such that

$$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.}$$

where the joint distribution is

$$P_{X^n,A^n,B^n,Y^n}(dx^n,da^n,db^n,dy^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1},b^i) \otimes P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) \otimes P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1},b^{i-1},x^i) \otimes P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1})\text{-a.s.}$$

The filter is given by $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0,\ldots,n\}$ or by $\{P_{X_i|Y^{i-1}}(dx_i|y^{i-1}) : i = 0,\ldots,n\}$.

Fig. 3. Realizable nonanticipative rate distortion function.

Thus, if $\{P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) : i = 0,\ldots,n\}$ is a realization of the nonanticipative RDF minimizing distribution, then the channel connecting the source, encoder, channel, decoder achieves the nonanticipative RDF, and the filter is obtained. Clearly, $\{B_i : i = 0,1,\ldots,n\}$ is an auxiliary random process which is needed to obtain the filter $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0,\ldots,n\}$.

In the next section, we provide an example for such a 
realization. 
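Before turning to the Gaussian example, the exponentially tilted form of the optimal kernel in Theorem 4 can be checked numerically in the simplest setting. The sketch below is a minimal Python illustration under strong assumptions: a single time step ($n = 0$), finite alphabets, and a fixed multiplier $s \le 0$, in which case the kernel reduces to the classical fixed-point structure $P^{*}(y|x) \propto e^{s\rho(x,y)} P^{*}(y)$. The function name `tilted_kernel` and the iteration count are hypothetical choices, not taken from the paper.

```python
import math


def tilted_kernel(p_x, rho, s, iters=200):
    """Fixed-point iteration for the tilted kernel, one time step, finite alphabets.

    p_x : source probabilities P(x)
    rho : distortion matrix rho[x][y]
    s   : Lagrange multiplier, s <= 0
    Returns (kernel, q, distortion, rate_in_nats).
    """
    n_x, n_y = len(p_x), len(rho[0])
    q = [1.0 / n_y] * n_y                       # initial guess for the output marginal
    for _ in range(iters):
        # kernel(y|x) proportional to exp(s * rho(x, y)) * q(y)
        kernel = []
        for x in range(n_x):
            weights = [math.exp(s * rho[x][y]) * q[y] for y in range(n_y)]
            norm = sum(weights)
            kernel.append([w / norm for w in weights])
        # the output marginal must be consistent with the kernel
        q = [sum(p_x[x] * kernel[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(p_x[x] * kernel[x][y] * rho[x][y]
                     for x in range(n_x) for y in range(n_y))
    # rate = s * D - sum_x P(x) log( sum_y exp(s * rho(x, y)) q(y) )
    log_partition = sum(
        p_x[x] * math.log(sum(math.exp(s * rho[x][y]) * q[y] for y in range(n_y)))
        for x in range(n_x))
    rate = s * distortion - log_partition
    return kernel, q, distortion, rate


# Fair binary source with Hamming distortion.
kernel, q, D, R = tilted_kernel([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], s=-2.0)
```

For the fair binary source with Hamming distortion the result agrees with the classical expression $\log 2 - h(D)$ nats, where $h$ is the binary entropy and $D = e^{s}/(1+e^{s})$, so the one-step case recovers the familiar memoryless RDF.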

VI. Example 

Consider the following discrete-time partially observed linear Gauss-Markov system described by

$$X_{t+1} = A X_t + B W_t, \quad X_0 = X, \quad t \in \mathbb{N}^n$$
$$Y_t = C X_t + D V_t, \quad t \in \mathbb{N}^n \tag{16}$$

where $X_t \in \mathbb{R}^m$ is the state (unobserved) process of the information source (plant), and $Y_t \in \mathbb{R}^p$ is the partially observed (measurement) process. Assume that $(C, A)$ is detectable and $(A, (BB^{tr})^{1/2})$ is stabilizable, $(D \neq 0)$. The state and observation noise $\{(W_t, V_t) : t \in \mathbb{N}^n\}$ are mutually independent, independent of the Gaussian RV $X_0$ with parameters $N(\bar{x}_0, \bar{V}_0)$, where $\{W_t\}$ and $\{V_t\}$ are Gaussian IID processes with zero mean and identity covariances.

Fig. 4. Communication system.

The realization will be done following Fig. 4. The objective is to reconstruct $\{Y_t : t \in \mathbb{N}^n\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}^n\}$ via nonanticipative operations. The distortion is single letter, defined by



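$$d_{0,n}(y^n, \tilde{y}^n) \triangleq \frac{1}{n+1} \sum_{i=0}^{n} \|y_i - \tilde{y}_i\|^2.$$

The objective is to compute

$$\inf_{\overrightarrow{P}_{\tilde{Y}^n|Y^n} :\ E\{d_{0,n}(Y^n,\tilde{Y}^n)\} \le D} \mathbb{I}_{Y^n \to \tilde{Y}^n}(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n})$$
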



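and then realize the reconstruction distribution. The filter realization procedure is similar to the one used for the reconstruction of $\{X_t : t \in \mathbb{N}^n\}$ in [17]. The methodology, however, is based on the explicit formulae of the optimal reconstruction in Theorem 4. According to Theorem 4 the optimal reconstruction is given by

$$\overrightarrow{P}^{*}_{\tilde{Y}^n|Y^n}(d\tilde{y}^n|y^n) = \otimes_{i=0}^{n} \frac{e^{s\|\tilde{y}_i - y_i\|^2}\, P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}{\int e^{s\|\tilde{y}_i - y_i\|^2}\, P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}\text{-a.s.} \tag{17}$$

where $s \le 0$. Hence, from (17) it follows that $P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y^i} = P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(d\tilde{y}_i|\tilde{y}^{i-1},y_i)$-a.s., that is, the reconstruction is Markov with respect to the process $\{Y_i : i \in \mathbb{N}^n\}$. Moreover, since the exponential term $\|\tilde{y}_i - y_i\|^2$ in the RHS of (17) is quadratic in $(y_i, \tilde{y}_i)$, and $\{X_i : i \in \mathbb{N}^n\}$ is Gaussian, then $\{(X_i, \tilde{Y}_i) : i \in \mathbb{N}^n\}$ is jointly Gaussian, hence it follows that $P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(\cdot|\tilde{y}^{i-1},y_i)$ is Gaussian (for a fixed realization of $(\tilde{y}^{i-1}, y_i)$). Hence, it has the general form

$$\tilde{Y}_t = \bar{A}_t Y_t + \bar{B}_t \tilde{Y}^{t-1} + Z_t, \quad t \in \mathbb{N}^n \tag{18}$$

where $\bar{A}_t \in \mathbb{R}^{p \times p}$, $\bar{B}_t \in \mathbb{R}^{p \times tp}$, and $\{Z_t : t \in \mathbb{N}^n\}$ is an independent sequence of Gaussian vectors. The channel in (18) can be realized as follows.

The communication channel (18) can be realized via a scalar additive Gaussian noise channel with feedback defined by

$$B_t = A_t + Z_t, \quad t \in \mathbb{N}^n \tag{19}$$

where the encoder is a mapping $A_t = \Phi_t(Y_t, \tilde{Y}^{t-1})$ with power $P_t = \mathrm{Tr}\{E\{(A_t)^2\}\}$. For $A_t$ Gaussian the directed information is $I(A^t \to B^t) = \frac{1}{2}\log|1 + E\{(A_t)^2\}\mathrm{Cov}(Z_t)^{-1}|$. The decoder at time $t \in \mathbb{N}^n$ receives $B^t$ and computes the reconstruction $\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1})$.

Realization of the nonanticipative RDF. The realization is based on the block diagram of Fig. 5. The encoder $\Phi_t(\cdot,\cdot)$ consists of a pre-encoder which produces the Gaussian innovation process $\{K_t : t \in \mathbb{N}^n\}$, defined by

$$K_t \triangleq Y_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N}^n \tag{20}$$

whose covariance is defined by $\Lambda_t \triangleq E\{K_t K_t^{tr}\}$. The decoder consists of a pre-decoder $\{\tilde{K}_t : t \in \mathbb{N}^n\}$ which is defined by

$$\tilde{K}_t \triangleq \tilde{Y}_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N}^n. \tag{21}$$

Note that the fidelity criterion satisfies $d_{0,n}(y^n, \tilde{y}^n) = d_{0,n}(k^n, \tilde{k}^n) = \frac{1}{n+1}\sum_{t=0}^{n} \|k_t - \tilde{k}_t\|^2$. Let $\{E_t : t \in \mathbb{N}^n\}$ be the unitary matrix that diagonalizes $\{\Lambda_t : t \in \mathbb{N}^n\}$, such that

$$E_t \Lambda_t E_t^{tr} = \mathrm{diag}\{\lambda_{t,1}, \ldots, \lambda_{t,p}\}, \quad t \in \mathbb{N}^n.$$

Choose $\{\delta_{t,i} : t \in \mathbb{N}^n, i = 1,\ldots,p\}$ such that

$$\delta_{t,i} \triangleq \begin{cases} \xi_t & \text{if } \xi_t \le \lambda_{t,i} \\ \lambda_{t,i} & \text{if } \xi_t > \lambda_{t,i} \end{cases}, \quad t \in \mathbb{N}^n,\ i = 1,\ldots,p \tag{22}$$

where $\{\xi_t : t \in \mathbb{N}^n\}$ satisfies $\sum_{i=1}^{p} \delta_{t,i} = D$. Define $\Gamma_t \triangleq E_t K_t$. Then $\{\Gamma_t : t \in \mathbb{N}^n\}$ is an orthogonal process. Let $\{\tilde{\Gamma}_t : t \in \mathbb{N}^n\}$ denote its reconstruction and define $d_{0,n}(\Gamma^n, \tilde{\Gamma}^n) \triangleq \frac{1}{n+1}\sum_{t=0}^{n} \|\Gamma_t - \tilde{\Gamma}_t\|^2$. Then by [18],

$$\inf_{\overrightarrow{P}_{\tilde{\Gamma}^n|\Gamma^n} :\ E\{d_{0,n}(\Gamma^n,\tilde{\Gamma}^n)\} \le D} \mathbb{I}_{\Gamma^n \to \tilde{\Gamma}^n}(P_{\Gamma^n}, \overrightarrow{P}_{\tilde{\Gamma}^n|\Gamma^n})$$

has a solution

$$\overrightarrow{P}^{*}_{\tilde{\Gamma}^n|\Gamma^n}(d\tilde{\gamma}^n|\gamma^n) = \otimes_{t=0}^{n} P^{*}_{\tilde{\Gamma}_t|\Gamma_t}(d\tilde{\gamma}_t|\gamma_t)\text{-a.s.}$$

where the components satisfy $P^{*}_{\tilde{\Gamma}_{t,i}|\Gamma_{t,i}} \sim N(\eta_{t,i}\gamma_{t,i}, \eta_{t,i}\delta_{t,i})$, $\eta_{t,i} \triangleq 1 - \frac{\delta_{t,i}}{\lambda_{t,i}}$, $i = 1,\ldots,p$, and

$$R^{c}_{0,n}(D) = \frac{1}{n+1}\sum_{t=0}^{n} \frac{1}{2}\sum_{i=1}^{p} \log\Big(\frac{\lambda_{t,i}}{\delta_{t,i}}\Big).$$

Thus, the pre-encoder can be further scaled by $\Gamma_t = E_t K_t$, and $\Gamma_t$ is compressed by $A_t = \mathcal{A}_t \Gamma_t$ and sent through an additive white Gaussian noise (AWGN) channel with feedback, after which the received signal is decompressed by $\tilde{\Gamma}_t = \mathcal{B}_t B_t$ in the pre-decoder. By the knowledge of the channel output at the decoder, the mean square estimator $\widehat{X}_t$ is generated at the decoder (and encoder because $\widehat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$). The complete design is illustrated in Fig. 5. Next we pick a specific AWGN channel.

Fig. 5. Design of the discrete-time communication system with scalar additive white Gaussian noise (AWGN) channel.
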


Scalar AWGN Channel. Consider a scalar channel $B_t = A_t + Z_t$, $t \in \mathbb{N}^n$, where $Z_t$ is Gaussian with zero mean, $Q = \mathrm{Cov}(Z_t)$, and $A_t \in \mathbb{R}$. We can design $\{(\mathcal{A}_t, \mathcal{B}_t) : t \in \mathbb{N}^n\}$ by



B t = 



laiPt 



, t e N ri 



y At,i y A* 
V ai-Pt At,i, . . . , y a p Pt\t, P 



where $\sum_{i=1}^{p} \alpha_i = 1$, $i = 1, \ldots, p$.
Note that



fft = B t At 

y/axPtXt,!, ■ ■ ■ , \/a p P t \ t . 



tr 


r /^iP 


a P Pt~ 




-V At,i 





•^/aiPtA*,! 

\/ OtpPtXt,p 
Oil 



laiPt 
Ao 



/apP t 



P 



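Therefore,

$$\tilde{\Gamma}_t = H_t \Gamma_t + \mathcal{B}_t Z_t, \quad \Gamma_t = E_t K_t, \quad t \in \mathbb{N}^n. \tag{23}$$

By pre-multiplying $\tilde{\Gamma}_t$ by $E_t^{tr}$ we can construct

$$\tilde{K}_t = E_t^{tr}\tilde{\Gamma}_t = E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t, \quad t \in \mathbb{N}^n.$$

The reconstruction of $Y_t$ is given by the sum of $\tilde{K}_t$ and $C\widehat{X}_t$ as follows.

$$\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1}) = \tilde{K}_t + C\widehat{X}_t, \quad \widehat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\} \tag{24}$$
$$= E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t + C\widehat{X}_t, \quad t \in \mathbb{N}^n. \tag{25}$$
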



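Next, it will be shown that the desired distortion is achieved by the above realization, while the filter of $\{Y_t : t \in \mathbb{N}^n\}$ is based on $\{\tilde{Y}_t : t \in \mathbb{N}^n\}$ given by (25). First, we notice that

$$E\{(Y_t - \tilde{Y}_t)^{tr}(Y_t - \tilde{Y}_t)\} = \mathrm{Tr}\big(E\{(Y_t - \tilde{Y}_t)(Y_t - \tilde{Y}_t)^{tr}\}\big).$$

Then we can compute

$$E\{(Y_t - \tilde{Y}_t)^{tr}(Y_t - \tilde{Y}_t)\} = \mathrm{Tr}\,E\{(K_t - \tilde{K}_t)(K_t - \tilde{K}_t)^{tr}\}$$
$$= \mathrm{Tr}\,E\{(K_t - E_t^{tr}\tilde{\Gamma}_t)(K_t - E_t^{tr}\tilde{\Gamma}_t)^{tr}\}$$
$$= \mathrm{Tr}\,E\{(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr}\mathcal{B}_t Z_t)(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr}\mathcal{B}_t Z_t)^{tr}\}$$
$$= \mathrm{Tr}\,E\{((I - E_t^{tr} H_t E_t)K_t - E_t^{tr}\mathcal{B}_t Z_t)((I - E_t^{tr} H_t E_t)K_t - E_t^{tr}\mathcal{B}_t Z_t)^{tr}\}$$
$$= \mathrm{Tr}\{(I - E_t^{tr} H_t E_t)\Lambda_t(I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr}\mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\}$$
$$= \mathrm{Tr}\{(I - E_t^{tr} H_t E_t)E_t^{tr}\,\mathrm{diag}\{\lambda_{t,1},\ldots,\lambda_{t,p}\}\,E_t(I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr}\mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\}$$
$$= \mathrm{Tr}\{E_t^{tr}\big((I - H_t)\,\mathrm{diag}(\lambda_{t,1},\ldots,\lambda_{t,p})\,(I - H_t)^{tr} + \mathcal{B}_t Q \mathcal{B}_t^{tr}\big)E_t\}$$
$$= \mathrm{Tr}\{\mathrm{diag}\{\delta_{t,1},\ldots,\delta_{t,p}\}\}$$
$$= D.$$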
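The trace computation above can be sanity-checked numerically. The following Python sketch assumes $p = 2$, takes $E_t$ to be a plane rotation, and assumes that the decompressed noise covariance satisfies $\mathcal{B}_t Q \mathcal{B}_t^{tr} = H_t \Delta_t$ with $\Delta_t = \mathrm{diag}\{\delta_{t,1}, \delta_{t,2}\}$, which is the choice consistent with Gaussian components of variance $\eta_{t,i}\delta_{t,i}$. The helper names and numerical values are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def realized_distortion(lam, delta, theta):
    """Tr{(I - E'HE) Lambda (I - E'HE)' + E' H Delta E} for a 2x2 rotation E."""
    e = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]               # orthogonal matrix E_t
    et = transpose(e)
    h = diag([1.0 - d / l for d, l in zip(delta, lam)])    # H_t = diag(eta_i)
    big_lambda = matmul(matmul(et, diag(lam)), e)          # Lambda_t = E' diag(lambda) E
    ehe = matmul(matmul(et, h), e)
    i_minus = [[(1.0 if i == j else 0.0) - ehe[i][j] for j in range(2)]
               for i in range(2)]
    signal = matmul(matmul(i_minus, big_lambda), transpose(i_minus))
    # Assumed noise term: E' (B Q B') E with B Q B' = H Delta.
    noise = matmul(matmul(et, matmul(h, diag(delta))), e)
    return sum(signal[i][i] + noise[i][i] for i in range(2))


distortion = realized_distortion(lam=[4.0, 1.0], delta=[0.5, 0.25], theta=0.7)
```

Under these assumptions each component contributes $(1-\eta_{t,i})^2\lambda_{t,i} + \eta_{t,i}\delta_{t,i} = \delta_{t,i}$, so the returned value equals $\delta_{t,1} + \delta_{t,2}$ regardless of the rotation angle.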



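Decoder. The decoder is $\tilde{Y}_t = \tilde{K}_t + C\widehat{X}_t$, where $\{\widehat{X}_t : t \in \mathbb{N}^n\}$ is obtained from the modified Kalman filter as follows. Recall that

$$\tilde{Y}_t = \tilde{K}_t + C\widehat{X}_t$$
$$= E_t^{tr} H_t E_t (Y_t - C\widehat{X}_t) + E_t^{tr}\mathcal{B}_t Z_t + C\widehat{X}_t$$
$$= E_t^{tr} H_t E_t (C X_t + D V_t - C\widehat{X}_t) + E_t^{tr}\mathcal{B}_t Z_t + C\widehat{X}_t$$
$$= E_t^{tr} H_t E_t C X_t - E_t^{tr} H_t E_t C \widehat{X}_t + C\widehat{X}_t + (E_t^{tr} H_t E_t D V_t + E_t^{tr}\mathcal{B}_t Z_t)$$

where $\{V_t : t \in \mathbb{N}^n\}$ and $\{Z_t : t \in \mathbb{N}^n\}$ are independent Gaussian vectors. Then $\widehat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$ is given by the modified Kalman filter

$$\widehat{X}_{t+1} = A\widehat{X}_t + A\Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (\tilde{Y}_t - C\widehat{X}_t), \quad \widehat{X}_0 = \bar{x}_0$$
$$\Sigma_{t+1} = A\Sigma_t A^{tr} - A\Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (E_t^{tr} H_t E_t C)\Sigma_t A^{tr} + BB^{tr}, \quad \Sigma_0 = \bar{V}_0$$

where

$$M_t = E_t^{tr} H_t E_t C \Sigma_t (E_t^{tr} H_t E_t C)^{tr} + E_t^{tr} H_t E_t D D^{tr} (E_t^{tr} H_t E_t)^{tr} + E_t^{tr}\mathcal{B}_t Q \mathcal{B}_t^{tr} E_t.$$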
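The covariance recursion above has the structure of a standard Riccati recursion with an effective observation matrix and an effective noise covariance. The Python sketch below illustrates it in the scalar case only; `c_eff` stands in for $E_t^{tr} H_t E_t C$ and `r_eff` for the sum of the two noise terms in $M_t$, both collapsed to constants. These names and the time-invariant simplification are assumptions made for illustration.

```python
def riccati_step(sigma, a, b, c_eff, r_eff):
    """One step of the scalar error-covariance recursion.

    sigma : current prediction error variance Sigma_t
    a, b  : scalar versions of the system matrices A and B
    c_eff : scalar stand-in for E' H E C (assumed constant here)
    r_eff : scalar stand-in for the effective noise variance in M_t
    Returns (Sigma_{t+1}, predictor gain).
    """
    m = c_eff * sigma * c_eff + r_eff           # innovation variance M_t
    gain = a * sigma * c_eff / m                # gain A Sigma C_eff' M^{-1}
    sigma_next = a * sigma * a - gain * c_eff * sigma * a + b * b
    return sigma_next, gain


def steady_state(a, b, c_eff, r_eff, sigma0=1.0, tol=1e-13, max_iter=10000):
    """Iterate the recursion until the variance stops changing."""
    sigma = sigma0
    for _ in range(max_iter):
        nxt, _ = riccati_step(sigma, a, b, c_eff, r_eff)
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    return sigma


sigma_inf = steady_state(a=0.5, b=1.0, c_eff=1.0, r_eff=1.0)
```

For these illustrative values the fixed point solves $\Sigma^2 - 0.25\,\Sigma - 1 = 0$, which gives a quick way to confirm that the iteration has converged to the stationary solution.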

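Infinite Horizon. As $t \to \infty$, under the assumption that the linear Gauss-Markov system is stabilizable and detectable, we have

$$\Sigma_{\infty} = A\Sigma_{\infty}A^{tr} - A\Sigma_{\infty}(E_{\infty}^{tr} H_{\infty} E_{\infty} C)^{tr} M_{\infty}^{-1} (E_{\infty}^{tr} H_{\infty} E_{\infty} C)\Sigma_{\infty}A^{tr} + BB^{tr}$$

where

$$M_{\infty} = E_{\infty}^{tr} H_{\infty} E_{\infty} C \Sigma_{\infty} (E_{\infty}^{tr} H_{\infty} E_{\infty} C)^{tr} + E_{\infty}^{tr} H_{\infty} E_{\infty} D D^{tr} (E_{\infty}^{tr} H_{\infty} E_{\infty})^{tr} + E_{\infty}^{tr}\mathcal{B}_{\infty} Q \mathcal{B}_{\infty}^{tr} E_{\infty}$$

and $E_{\infty}$ is the unitary matrix that diagonalizes $\Lambda_{\infty}$ by

$$E_{\infty}\Lambda_{\infty}E_{\infty}^{tr} = \mathrm{diag}(\lambda_{\infty,1}, \ldots, \lambda_{\infty,p})$$

and

$$\delta_{\infty,i} \triangleq \begin{cases} \xi_{\infty} & \text{if } \xi_{\infty} \le \lambda_{\infty,i} \\ \lambda_{\infty,i} & \text{if } \xi_{\infty} > \lambda_{\infty,i} \end{cases}$$

satisfying $\sum_{i=1}^{p} \delta_{\infty,i} = D$. Define

$$\Delta_{\infty} = \mathrm{diag}(\delta_{\infty,1}, \ldots, \delta_{\infty,p}), \quad H_{\infty} = \mathrm{diag}(\eta_{\infty,1}, \ldots, \eta_{\infty,p})$$

where $\eta_{\infty,i} = 1 - \frac{\delta_{\infty,i}}{\lambda_{\infty,i}}$. The realizable (nonanticipative) RDF can be computed as follows.

$$R^{c}(D) = \lim_{t \to \infty}\ \inf_{\overrightarrow{P}_{\tilde{Y}^t|Y^t} \in \overrightarrow{\mathcal{Q}}_{0,t}(D)} \frac{1}{t+1}\,\mathbb{I}_{Y^t \to \tilde{Y}^t}(P_{Y^t}, \overrightarrow{P}_{\tilde{Y}^t|Y^t})$$
$$= \lim_{t \to \infty} \frac{1}{t+1}\sum_{j=0}^{t} \frac{1}{2}\sum_{i=1}^{p} \log\Big(\frac{\lambda_{j,i}}{\delta_{j,i}}\Big) = \frac{1}{2}\sum_{i=1}^{p} \log\Big(\frac{\lambda_{\infty,i}}{\delta_{\infty,i}}\Big)$$
$$= \frac{1}{2}\log\frac{|\Lambda_{\infty}|}{|\Delta_{\infty}|}.$$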
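The water level $\xi_{\infty}$ and the resulting rate can be computed numerically once the eigenvalues $\lambda_{\infty,i}$ are known. The following Python sketch uses bisection on $\xi_{\infty}$; the function name `reverse_water_filling`, the tolerance, and the example eigenvalues are hypothetical, and the rate is returned in nats.

```python
import math


def reverse_water_filling(eigenvalues, total_distortion, tol=1e-12):
    """Solve sum_i min(xi, lambda_i) = D for the water level xi by bisection.

    Returns (xi, deltas, etas, rate_in_nats) where
      deltas[i] = min(xi, lambda_i),
      etas[i]   = 1 - deltas[i] / lambda_i,
      rate      = 0.5 * sum_i log(lambda_i / deltas[i]).
    """
    if not 0.0 < total_distortion <= sum(eigenvalues):
        raise ValueError("need 0 < D <= sum of eigenvalues")
    lo, hi = 0.0, max(eigenvalues)
    while hi - lo > tol:
        xi = 0.5 * (lo + hi)
        if sum(min(xi, lam) for lam in eigenvalues) < total_distortion:
            lo = xi            # water level too low, distortion budget not used up
        else:
            hi = xi
    xi = 0.5 * (lo + hi)
    deltas = [min(xi, lam) for lam in eigenvalues]
    etas = [1.0 - d / lam for d, lam in zip(deltas, eigenvalues)]
    rate = 0.5 * sum(math.log(lam / d) for lam, d in zip(eigenvalues, deltas))
    return xi, deltas, etas, rate


xi, deltas, etas, rate = reverse_water_filling([4.0, 1.0, 0.25], total_distortion=1.5)
```

With eigenvalues $(4, 1, 0.25)$ and $D = 1.5$ the water level is $0.625$: the smallest component is reproduced with distortion equal to its variance (so $\eta = 0$ and it carries no rate), and the rate equals $\frac{1}{2}\log(6.4 \cdot 1.6) = \log 3.2$ nats.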

The power constraint satisfies $\mathrm{Tr}\{E\{(A_t)^2\}\} = P_t$, $\lim_{t \to \infty} P_t = P$. Since $A_t = \mathcal{A}_t E_t K_t$ the capacity is


$$C = \lim_{t \to \infty} \frac{1}{t+1} I(A^t \to B^t) = \frac{1}{2}\log \lim_{t \to \infty} \frac{1}{t+1}\big|1 + E\{(A_t)^2\}Q^{-1}\big| = \frac{1}{2}\log\frac{|\Lambda_{\infty}|}{|\Delta_{\infty}|} = R^{c}(D). \tag{26}$$




Thus, for a given distortion level $D$, $C = R^{c}(D)$ is the minimum capacity under which there exists a realizable filter for the data reconstruction of $\{Y_t : t \in \mathbb{N}\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}\}$ ensuring an average distortion equal to $D$. The filter of $\{X_t : t \in \mathbb{N}\}$ or $\{Y_t : t \in \mathbb{N}\}$ is obtained for $\{\tilde{K}_t : t \in \mathbb{N}\}$ given by (23) or the auxiliary data $B_i = A_i(Y_i, \tilde{Y}^{i-1}) + Z_i$, $i \in \mathbb{N}$.

VII. Conclusion

In this paper, the solution of the nonanticipative RDF 
is obtained on abstract spaces using the topology of weak 
convergence of probability measures. A specific example that 
realizes the optimal causal filter is discussed. 